Least Absolute Policy Iteration: A Robust Approach to Value Function Approximation
Abstract
Least-squares policy iteration is a useful reinforcement learning method in robotics thanks to its computational efficiency, but it tends to be sensitive to outliers in the observed rewards. In this paper, we propose an alternative method that employs the absolute loss to enhance robustness and reliability. The proposed method is formulated as a linear programming problem, which can be solved e...
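The abstract's key idea is replacing the squared loss with the absolute loss, which the standard slack-variable trick turns into a linear program. A minimal sketch of that reduction (the feature matrix `Phi` and data here are illustrative, not the paper's formulation; `scipy.optimize.linprog` stands in for whatever LP solver the authors used):

```python
import numpy as np
from scipy.optimize import linprog

def least_absolute_fit(Phi, y):
    """Fit weights w minimizing sum_i |Phi_i . w - y_i| via linear programming.

    Decision variables are [w, t], where each slack t_i bounds the absolute
    residual: -t_i <= Phi_i . w - y_i <= t_i, and we minimize sum_i t_i.
    """
    n, d = Phi.shape
    c = np.concatenate([np.zeros(d), np.ones(n)])   # objective: sum of slacks
    I = np.eye(n)
    A_ub = np.block([[Phi, -I], [-Phi, -I]])        # Phi w - y <= t  and  y - Phi w <= t
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)] * n   # w free, slacks nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]

# Five points on the line y = x plus one large reward outlier: the L1 fit
# recovers the line, whereas least squares would be dragged toward 50.
Phi = np.array([[1.0, x] for x in range(6)])
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 50.0])
w = least_absolute_fit(Phi, y)
```

This illustrates the robustness claim in the abstract: the L1 objective penalizes the outlier linearly rather than quadratically, so a single corrupted reward barely moves the fitted value function.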
Similar resources

A uniform approximation method to solve absolute value equation
In this paper, we propose a parametric uniform approximation method for solving NP-hard absolute value equations. We uniformly approximate the absolute value so that the nonsmooth absolute value equation can be reformulated as a smooth nonlinear equation. By solving the parametric smooth nonlinear equation with Newton's method, for a decreasing sequence of parameters, we can get the ...
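The scheme in this abstract can be sketched concretely. Assuming the smoothing phi_eps(x) = sqrt(x^2 + eps), which is one common uniform approximation of |x| (the paper's exact parametric family is not given in the snippet), Newton's method on the smoothed equation A x - phi_eps(x) = b, warm-started across a decreasing eps sequence, looks like:

```python
import numpy as np

def solve_ave(A, b, eps_seq=(1.0, 1e-2, 1e-4, 1e-8), newton_iters=50):
    """Solve the absolute value equation A x - |x| = b by smoothing.

    |x| is replaced componentwise by phi_eps(x) = sqrt(x^2 + eps); for each
    eps in a decreasing sequence we run Newton's method on
    F(x) = A x - phi_eps(x) - b, warm-starting from the previous solution.
    """
    n = A.shape[0]
    x = np.zeros(n)
    for eps in eps_seq:
        for _ in range(newton_iters):
            phi = np.sqrt(x**2 + eps)
            F = A @ x - phi - b
            J = A - np.diag(x / phi)   # Jacobian: A - diag(phi_eps'(x))
            x = x - np.linalg.solve(J, F)
            if np.linalg.norm(F) < 1e-12:
                break
    return x

# Example with A = 3*I: 3x - |x| = b gives x = b/2 for b >= 0.
A = 3.0 * np.eye(3)
b = np.array([2.0, 4.0, 6.0])
x = solve_ave(A, b)
```

Note this sketch assumes a well-conditioned instance (here the singular values of A exceed 1, a standard sufficient condition for unique solvability); the general problem is NP-hard, as the abstract states.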
Least-Squares Policy Iteration
We propose a new approach to reinforcement learning for control problems which combines value-function approximation with linear architectures and approximate policy iteration. This new approach is motivated by the least-squares temporal-difference learning algorithm (LSTD) for prediction problems, which is known for its efficient use of sample experiences compared to pure temporal-difference a...
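The LSTD component mentioned in the abstract has a compact closed form: accumulate A = sum phi(s)(phi(s) - gamma*phi(s'))^T and b = sum r*phi(s), then solve A w = b. A minimal policy-evaluation sketch (the feature map and sample transitions below are illustrative, not from the paper):

```python
import numpy as np

def lstd(transitions, phi, d, gamma=0.95):
    """Least-squares temporal-difference learning (LSTD) for policy evaluation.

    transitions: iterable of (s, r, s_next) samples generated by the policy.
    phi: feature map s -> R^d. Returns weights w with V(s) ~ phi(s) . w.
    """
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)   # A = sum phi (phi - gamma phi')^T
        b += r * f                             # b = sum r phi
    return np.linalg.solve(A, b)

# Two-state chain: state 0 steps to absorbing state 1 with reward 1,
# state 1 self-loops with reward 0, so V(0) = 1 and V(1) = 0.
phi = lambda s: np.eye(2)[s]
w = lstd([(0, 1.0, 1), (1, 0.0, 1)], phi, d=2, gamma=0.95)
```

Because every sample contributes to the same linear system, LSTD reuses experience far more efficiently than per-sample temporal-difference updates, which is the efficiency property the abstract highlights.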
Robust Modified Policy Iteration
Robust dynamic programming (robust DP) mitigates the effects of ambiguity in transition probabilities on the solutions of Markov decision problems. We consider the computation of robust DP solutions for discrete-stage, infinite-horizon, discounted problems with finite state and action spaces. We present robust modified policy iteration (RMPI) and demonstrate its convergence. RMPI encompasses bo...
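The robust Bellman backup underlying this abstract can be sketched for the simplest ambiguity set: a finite list of candidate transition kernels, with each backup taking the worst case over models before maximizing over actions. This is a simplified illustration, not the paper's RMPI, which also interleaves partial policy-evaluation steps:

```python
import numpy as np

def robust_value_iteration(P_set, R, gamma=0.9, iters=500):
    """Robust value iteration over a finite ambiguity set of transition models.

    P_set: array (K, |A|, |S|, |S|) of candidate transition kernels.
    R: array (|A|, |S|) of rewards. Each backup is pessimistic over the
    K models, then greedy over actions.
    """
    V = np.zeros(P_set.shape[2])
    for _ in range(iters):
        # Q[k, a, s] = R[a, s] + gamma * sum_s' P[k, a, s, s'] * V[s']
        Q = R[None, :, :] + gamma * np.einsum("kast,t->kas", P_set, V)
        V = Q.min(axis=0).max(axis=0)   # worst model, then best action
    return V

# Two candidate models for a single action: model 0 keeps state 0 (reward 1
# forever), model 1 pushes it to absorbing state 1 (reward 0). The robust
# value is pessimistic: V(0) = 1, versus V(0) = 10 under model 0 alone.
P_set = np.array([
    [[[1.0, 0.0], [0.0, 1.0]]],   # model 0
    [[[0.0, 1.0], [0.0, 1.0]]],   # model 1
])
R = np.array([[1.0, 0.0]])
V = robust_value_iteration(P_set, R)
```

The inner minimization is what mitigates ambiguity in the transition probabilities: the computed value is a guarantee that holds for every model in the set.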
Journal

Journal title: IEICE Transactions on Information and Systems
Year: 2010
ISSN: 0916-8532,1745-1361
DOI: 10.1587/transinf.e93.d.2555